Additional Considerations in Determining
Sample Size
Joel R. Levin and Michael J. Subkoviak 
University of Wisconsin 

ABSTRACT 

Levin's (1975) sample-size determination procedure for completely
randomized analysis of variance designs is extended to designs in which
antecedent or blocking variable information is considered. In particular, a
researcher's choice of designs is framed in terms of determining the respective
sample sizes necessary to detect specified contrasts of a given magnitude
with given Type I and Type II errors. A solution is provided for dealing with
real-world considerations in which errors of measurement cannot be neglected.
A worked example presents an instance wherein a blocking strategy is clearly
advantageous assuming infallible measuring instruments, but not when the same
instruments are granted fallibility.

INTRODUCTION

When it comes to designing an experiment, an educational researcher can
draw from a variety of sources--some in the form of old wives' tales, and some
in the form of theoretically sound recommendations (e.g., Feldt, 1958)--to
determine whether it is preferable to assign subjects randomly to K
experimental conditions and subsequently to perform an analysis of variance on
the dependent variable Y (hereafter referred to as a completely randomized
design); or rather to include in the analysis antecedent information based on
variable X (known or assumed to be related to Y). The antecedent information
included can be operationally dealt with in various ways: chiefly, in terms of
randomized blocks analysis, analysis of covariance, or analysis of an index of
response (such as change scores); cf. Porter & Chibucos (1974).

The major advantage of these procedures, relative to the completely
randomized design, is one of reducing the within-treatment variability by
removing the variation in Y that is due to the relationship between X and Y.
The present paper focuses on one of these procedures, namely the randomized
block design, as a competitor to the completely randomized design; and, in
particular, it considers an alternative to the traditional way of deciding
whether to block or not to block that includes real-world situations in which
errors of measurement associated with X, Y, or both are likely to be present.
Moreover, since the discussion by Porter and Chibucos (1974) suggests that in
"true" (Campbell & Stanley, 1966) experiments of moderate sample size, analysis
of covariance and analysis of an index of response may be regarded as
essentially equivalent procedures to blocking--within degrees-of-freedom
differences and slight differences in their error expected mean squares--the
material presented here has implications for the other two procedures as well.

Reliability and Sample Size

Statistics texts typically acknowledge four ingredients of hypothesis
testing: (a) Type I error probability ($\alpha$); (b) Type II error probability
($\beta$) or its complement, power ($1 - \beta$); (c) sample size; and (d) the
magnitude of the experimental effect of interest. In planning an experiment, a
researcher can specify $\alpha$ and the power desired to detect an effect of
specified magnitude, and subsequently calculate the required sample size; or,
in evaluating a completed experiment, the predetermined $\alpha$ level and
sample size can be used to compute the power that was available to detect an
effect of given magnitude.

Such calculations tacitly assume that dependent variables and/or antecedent
variables are measured without error, i.e., they are perfectly reliable (true
scores). In actual practice, however, both antecedent and dependent variables
are measured with error, i.e., they are fallible (observed scores), with the
result that "textbook" power/sample size calculations (which do not take the
unreliability of the observed data into account) produce inaccurate estimates.
In particular, they produce underestimates of required sample sizes in the
planning stage and overestimates of available power in the post hoc evaluation
condition. The present paper provides formulas for the computation of power
and sample size that include the reliability coefficient of observed scores,
thereby augmenting the list of hypothesis-testing ingredients mentioned above.

Several authors have considered the effect of unreliability on statistical
tests (e.g., Cleary & Linn, 1969; Cleary, Linn, & Walster, 1970; Overall
& Dalal, 1965; Sutcliffe, 1958; Porter, Note 1). Cleary et al. (1970), for
example, have demonstrated that the power of the F-test in a one-way,
fixed-effects analysis of variance (ANOVA) decreases as the reliability--and
also as the validity--of the dependent variable decreases. The purpose of the
present paper is to extend some of the Cleary et al. notions to designs in which
antecedent information is considered; in particular, to the randomized block
design. Moreover, in contrast to the commonly recommended strategy for
deciding whether or not it would be advantageous to block (i.e., by determining
the relative efficiency of a randomized block design to a completely randomized
design for a fixed number of subjects; cf. Kirk, 1968, pp. 147-149), the
strategy adopted here consists of framing the decision in terms of the
respective sample sizes associated with the two designs that are required to
yield equivalent power for detecting specified effects of interest (see, for
example, Cohen, 1969, pp. 46-50).

CASE 1: LATENT TRUE VARIABLES

Sample Size Determination for the Completely Randomized Design

The reader is referred to Levin (1975) for a discussion of sample size
determination based on a researcher's a priori specification of the minimum
value of any given linear contrast of interest (which has been called
$\Psi_\sigma$) in accordance with desired $\alpha$ and $1 - \beta$. The
resulting number of subjects required per experimental condition (n) guarantees
the researcher the desired power associated with detecting the contrast of
interest, should it be of the specified magnitude. In the case of a
planned-comparison approach to hypothesis testing, the F-test is performed with
1 and K(n - 1) degrees of freedom (these referring to the degrees of freedom
associated with the contrast and the mean square within, respectively); and in
this situation the probability of detecting a contrast of the magnitude
specified is alternatively the probability of obtaining a significant F-ratio
(both $1 - \beta$). In the case of a post hoc approach to hypothesis testing,
the F-test is performed with K - 1 and K(n - 1) degrees of freedom (where K - 1
represents the degrees of freedom associated with the mean square between); and
in this situation the probability of detecting a contrast of the magnitude
specified is alternatively the probability of obtaining a significant F-ratio
and then identifying that contrast as statistically significant according to
Scheffé's (1953) multiple comparison procedure (see Levin, 1975). According to
this formulation, $\Psi_\sigma$ represents the magnitude of the contrast in
means considered to be of interest to the researcher, and which is expressed in
within-treatment standard deviation units ($\sigma$). Thus, if
$\psi = \sum_{k=1}^{K} a_k \mu_k$ (where the $a_k$ represent contrast
coefficients chosen such that $\sum_{k=1}^{K} a_k = 0$), then

$$\Psi_\sigma = \frac{\psi}{\sigma} = \frac{\sum_{k=1}^{K} a_k \mu_k}{\sigma}.$$

Sample Size Determination for the Randomized Block Design 

Rather than adopting the completely randomized design, a researcher may
choose to form n blocks of K subjects (on the basis of some relevant antecedent
information), and then randomly assign subjects within blocks to the K
treatment conditions. It is well known that the effect of introducing a
blocking variable into the design is to reduce $\sigma$ by a factor of
$\sqrt{1 - \rho_{XY}^2}$, where $\rho_{XY}$ represents the correlation between
the antecedent variable and the dependent variable. Thus, in terms of the
present approach, all that needs to be done is to redefine a standardized
contrast as

$$\Psi^*_\sigma = \frac{\sum_{k=1}^{K} a_k \mu_k}{\sigma \sqrt{1 - \rho_{XY}^2}} = \frac{\Psi_\sigma}{\sqrt{1 - \rho_{XY}^2}}.$$

The effect of blocking, then, is to increase the value of $\Psi_\sigma$ of the
completely randomized design which, if it overcompensates for the corresponding
loss in error degrees of freedom, i.e., from K(n - 1) to (K - 1)(n - 1), results
in a decrease in the number of subjects required in order to maintain equivalent
power to that in the completely randomized case.
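
As a minimal numerical sketch of this redefinition (the function names are illustrative and do not come from the paper), the blocked contrast and the error degrees of freedom of the two designs can be computed as follows, assuming perfectly reliable X and Y:

```python
import math


def blocked_contrast(psi_sigma, rho_xy):
    """Standardized contrast after blocking: sigma shrinks by sqrt(1 - rho_xy^2)."""
    return psi_sigma / math.sqrt(1.0 - rho_xy ** 2)


def error_df(n, K, blocked):
    """Error degrees of freedom with n subjects per treatment (n blocks if blocked)."""
    return (K - 1) * (n - 1) if blocked else K * (n - 1)


# Illustrative values: a one-standard-deviation contrast and rho_XY = .50.
psi_star = blocked_contrast(1.0, 0.50)
print(round(psi_star, 3))                      # 1.155
print(error_df(14, 2, blocked=False))          # 26
print(error_df(14, 2, blocked=True))           # 13
```

The larger contrast works in favor of blocking, while the halved error degrees of freedom (for K = 2) work against it; which effect dominates depends on the correlation and on n.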

CASE 2: FALLIBLE VARIABLES 

The above discussion has proceeded under the assumption that the only
"error" in the ANOVA model consists of subject error. If there is measurement
error as well, one's effective power will not be as great as one's nominal
power; or, stated differently, a researcher will require more subjects than
the "textbook" sample size determination indicates are needed in order to have
the desired power (see, for example, Cleary et al., 1970). Classical test
theory (Lord & Novick, 1968) assumes that the observed score $Y_i$ for person i
is equal to his or her true score $T_i$ plus measurement error $E_i$, such that
$Y_i = T_i + E_i$. Since $T_i$ and $E_i$ are independently distributed with
respective expected values of $\mu_T$ and 0 and respective variances of
$\sigma_T^2$ and $\sigma_E^2$, it follows that:

$$\mu_Y = \mu_T \qquad (1)$$

and

$$\sigma_Y^2 = \sigma_T^2 + \sigma_E^2 \qquad (2)$$

The reliability of observed scores $Y_i$ is the ratio of true score variance to
observed score variance:

$$\rho_{YY'} = \frac{\sigma_T^2}{\sigma_Y^2} \qquad (3)$$

Sample Size Determination for the Completely Randomized Design

How do these properties affect sample size determination in the completely
randomized design? As was noted previously, $\Psi_\sigma$ is simply a contrast
involving the treatment means which is expressed in within-treatment standard
deviation units, or

$$\Psi_\sigma = \frac{\sum_{k=1}^{K} a_k \mu_k}{\sigma}.$$

Because of the relationship in (1), the numerator of $\Psi_\sigma$ is unaffected
by measurement errors. What is affected is the denominator. Thus, $\sigma$ in
$\Psi_\sigma$ reflects the within-treatment standard deviation of the true
scores, or $\sigma_T$. Following Cleary et al. (1970) and employing (3), we note
that in terms of observed scores, $\sigma_Y = \sigma_T / \sqrt{\rho_{YY'}}$.
Thus, for the usual case where measurement errors associated with the dependent
variable are expected, we simply redefine

$$\Psi_{\sigma_Y} = \frac{\sum_{k=1}^{K} a_k \mu_k}{\sigma_Y} = \sqrt{\rho_{YY'}}\,\Psi_\sigma$$

where it may be easily shown (though it will not be here) that $\rho_{YY'}$
represents the (assumed common) within-treatment reliability of the dependent
variable.
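
The attenuation of the contrast by an unreliable dependent variable can be illustrated with a short sketch. It assumes the classical test theory relation between observed-score and true-score standard deviations described above; the function names are hypothetical.

```python
import math


def observed_sd(true_sd, reliability):
    """Observed-score SD implied by reliability = true variance / observed variance."""
    return true_sd / math.sqrt(reliability)


def attenuated_contrast(psi_sigma, rho_yy):
    """Standardized contrast re-expressed in observed-score SD units."""
    return math.sqrt(rho_yy) * psi_sigma


# Illustrative values: a one-SD true-score contrast and a reliability of .80.
print(round(observed_sd(1.0, 0.80), 3))          # 1.118
print(round(attenuated_contrast(1.0, 0.80), 3))  # 0.894
```

Because the noncentrality of the test grows in proportion to n times the squared contrast, shrinking the contrast by the square root of .80 inflates the required n by roughly 1/.80, which is in line with the move from 17 to 21 subjects per group reported in Table 1.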

Sample Size Determination for the Randomized Block Design

In the case of the randomized block design, the situation becomes
complicated due to potential errors of measurement associated with X in
addition to those associated with Y. Employing correction-for-attenuation
formulas, one can obtain the following general expression:



[expression illegible in the source]

(where $\rho_{XX'}$ represents the reliability of the antecedent variable).

It should be noted that this expression can be easily adapted to fit
various special cases. In particular, if only X is assumed to be fallible, it
may be seen that:

[expression illegible in the source]

On the other hand, if only Y is fallible:

[expression illegible in the source]

Finally, if neither X nor Y is fallible, the expression reduces to:

$$\frac{\Psi_\sigma}{\sqrt{1 - \rho_{XY}^2}} = \Psi^*_\sigma$$

which is as it should be.




AN EXAMPLE 

Levin's (1975) sample size determination formula is given by:

$$n = \frac{\phi^2 (\nu_1 + 1) \sum_{k=1}^{K} a_k^2}{\Psi_\sigma^2} \qquad (4)$$

where: $\phi$ = a parameter in the Pearson and Hartley (1951) power charts,
available in most experimental design textbooks; more complete
tables displaying $\phi$ are also available (e.g., Tiku, 1967, 1972).

Let us apply (4) to the simplest ANOVA situation, namely for K = 2, which is
equivalent to the independent two-sample (nondirectional) t-test situation.
Assume that a researcher wishes to have an 80 percent chance of detecting
a difference in K = 2 means of at least 1 standard deviation unit, based on
a Type I error probability of .05. How many subjects per treatment group
should he/she include? [With reference to Formula (4), it should be noted that
for all cases to be considered, $\nu_1 + 1 = 2$, which will always equal K in
the one-way layout; and $\sum_{k=1}^{K} a_k^2 = 2$, which will always be true
when only pairwise differences in means are of interest, even for K > 2.
(However, in some situations complex comparisons may interest the researcher,
in which case the value of $\sum_{k=1}^{K} a_k^2$ will change; see Levin, 1975.)]

The information contained in the preceding paragraph may be translated as
follows: $\alpha = .05$, $1 - \beta = .80$, $\Psi_\sigma = 1.00$. Incorporating
this into (4) and the appropriate power charts, and proceeding in the manner
described by Levin, we find that in the completely randomized situation
(assuming a perfectly reliable dependent variable), a total of 17 subjects per
treatment group is required to yield the desired power.

If we further assume that an antecedent variable is selected that
correlates .50 with performance on the dependent measure (i.e.,
$\rho_{XY} = .50$), then it can be seen that
$\Psi^*_\sigma = 1.00 / \sqrt{1 - .50^2} = 1.155$. Substituting this into (4)
and checking with the appropriate $\phi$, we find that if a randomized block
design (assuming perfectly reliable antecedent and dependent variables) were
employed, a total of 14 subjects per treatment group would be required to yield
equivalent power to that in the completely randomized design above.
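
The chart lookups can also be approximated numerically. The following is a sketch rather than Levin's procedure itself: it assumes a planned comparison (1 numerator degree of freedom), takes the noncentrality parameter to be n times the squared standardized contrast divided by the sum of squared contrast coefficients, and evaluates power from the noncentral F distribution using only the Python standard library. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued-fraction part of the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def contrast_power(n, psi, K=2, sum_a2=2.0, alpha=0.05, blocked=False):
    """Power of the 1-df F-test for standardized contrast psi, n subjects per group."""
    df1 = 1
    df2 = (K - 1) * (n - 1) if blocked else K * (n - 1)
    # Critical point on the beta scale x = df1*F / (df1*F + df2), found by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if betai(df1 / 2, df2 / 2, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x_crit = 0.5 * (lo + hi)
    # Noncentral F CDF written as a Poisson mixture of central beta CDFs.
    half_lam = 0.5 * n * psi ** 2 / sum_a2
    weight, cdf = math.exp(-half_lam), 0.0
    for j in range(200):
        cdf += weight * betai(df1 / 2 + j, df2 / 2, x_crit)
        weight *= half_lam / (j + 1)
    return 1.0 - cdf


def required_n(psi, target=0.80, **kwargs):
    """Smallest n per group whose power reaches the target."""
    n = 2
    while contrast_power(n, psi, **kwargs) < target:
        n += 1
    return n


psi_cr = 1.0
psi_rb = psi_cr / math.sqrt(1.0 - 0.50 ** 2)
print(required_n(psi_cr), required_n(psi_rb, blocked=True))
```

If the sketch's assumptions hold, the two printed values should agree with the chart-based results of 17 and 14 subjects per group reported above. Chart readings are approximate, so borderline cases may differ by one subject.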

Now let us suppose that either or both of the two variables involved
(antecedent and dependent) are fallible. Given separate (and equal)
reliabilities of $\rho_{XX'} = \rho_{YY'} = .80$, for example, we are able to
retrace the steps associated with (4), incorporating the redefined standardized
contrasts for the two designs as previously defined. Table 1 summarizes the
results of this endeavor.

Insert Table 1 about here

What is especially interesting about this particular example is that even
though we start out with a situation in which it is clearly preferable to block
(as reflected by a total savings of six subjects for Situation 1 of Table 1),
by the time the antecedent and dependent variables are both granted fallibility
on the order of $\rho_{XX'} = \rho_{YY'} = .80$, the randomized block advantage
disappears (as reflected by the 0 total subject savings difference in
Situation 4 of Table 1).
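
The savings figures follow from simple arithmetic on the per-group sample sizes in Table 1. A small sketch, assuming that savings are counted over both of the K = 2 treatment groups (the variable names are illustrative):

```python
K = 2  # number of treatment groups in the example

# Subjects per group from Table 1, as (CR, RB) for Situations 1 through 4.
per_group = {1: (17, 14), 2: (21, 18), 3: (17, 15), 4: (21, 21)}

# Total subjects saved by blocking = K * (n_CR - n_RB).
savings = {s: K * (n_cr - n_rb) for s, (n_cr, n_rb) in per_group.items()}
print(savings)  # {1: 6, 2: 6, 3: 4, 4: 0}
```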

To make this lesson somewhat more concrete, assume that a researcher is
interested in comparing the efficacy of two instructional variations designed to
teach eighth grade mathematics. Both variations are to be incorporated into
programed instruction booklets and randomly assigned to students (within
classrooms or schools), and end-of-year performance will be assessed via a
standardized mathematics achievement test. Suppose further in this hypothetical
situation that the production cost of the booklets is somewhat of a factor, so
that an experimental design that will yield the desired power with the fewest
students is the one to be selected. Given this information, should the
researcher randomly assign students to the two treatment conditions or block on
seventh grade standardized mathematics achievement scores, known to correlate
.50 with eighth grade scores? Ignoring the unreliability associated with the two
achievement tests (as in the "textbook" case), the researcher would clearly do
well to block; he would require six fewer students with a randomized block
design than with a completely randomized design. However, considering the
published reliabilities of the two tests of .80, the researcher would discover
that it makes little difference which of the two experimental designs he
selects, since there is a 0 subject savings. In fact, if it would require some
additional effort to obtain and/or record the seventh grade achievement data,
the researcher may well opt for the seemingly less efficient (though not so in
this case) completely randomized design.

CONCLUSION

This particular example is but one of several that could have been
contrived. What should be clear to the reader, based on this example and the
larger message of this paper, is as follows: First, each potential experiment
should be examined on an a priori basis to determine whether or not it is
advantageous to block. This decision cannot be made without considering the
number of treatment conditions included, the magnitude of the relationship
between the antecedent and dependent variables ($\rho_{XY}$), as well as the
various hypothesis-testing ingredients described at the outset of the paper.
Second, to follow these procedures without simultaneously considering errors of
measurement is to live in a "fool's paradise," for these too will affect
block-no block decisions. In cases where a priori reliability information is
lacking, pilot research or sagacious judgments (to obtain approximate and
conservative estimates, respectively) will surely do better than nothing.

Table 1

Comparison of Completely Randomized (CR) and Randomized
Block (RB) Design Sample Sizes for the Present Example
($K = 2$, $\alpha = .05$, $1 - \beta = .80$, $\Psi_\sigma = 1.00$, $\rho_{XY} = .50$)

                                                 $\Psi_\sigma$ or   Number of            Total Subject
Situation                               Design   Equivalent         Subjects Per Group   Savings (RB - CR)

1. X is Infallible,                     CR       1.000              17
   Y is Infallible                      RB       1.155              14                   6

2. X is Infallible,                     CR        .894              21
   Y is Fallible ($\rho_{YY'} = .80$)   RB        .989              18                   6

3. X is Fallible ($\rho_{XX'} = .80$),  CR       1.000              17
   Y is Infallible                      RB       1.106              15                   4

4. X is Fallible ($\rho_{XX'} = .80$),  CR        .894              21
   Y is Fallible ($\rho_{YY'} = .80$)   RB        .931              21                   0